CS267 Fall 2011 Practice Final

To study for the final I would suggest you: (1) Know how to do (by heart) all the practice problems. (2) Go over your notes at least three times. The second and third times, try to see how much you can remember from the first time. (3) Go over the homework problems. (4) Try to create your own problems similar to the ones I have given and solve them. (5) Skim the relevant sections from the book. (6) If you want to study in groups, at this point you are ready to quiz each other.

The practice final is below. Here are some facts about the actual final: (a) It is comprehensive. (b) It is closed book, closed notes. Nothing will be permitted on your desk except your pen (pencil) and test. (c) You should bring photo ID. (d) There will be more than one version of the test. Each version will be of comparable difficulty. (e) It has 10 problems: six will be on material since the midterm, and four will come from topics covered prior to the midterm. (f) Two problems will be taken exactly (less typos) from the practice final, and one will be taken from the practice midterm.

Suppose in a document we see 1 occurrence of an a, 2 occurrences of a b, ... 26 occurrences of a z. Draw the Huffman tree one would get following the algorithm from class.
For each of the following, give the distribution for which it is an optimal code: (a) unary code, (b) gamma code, (c) delta code.
Explain why it is reasonable to guess that `Delta`-values for posting lists follow a geometric distribution.
Briefly describe the REBUILD and REMERGE batch index operations.
What are some advantages and disadvantages of the IMMEDIATE MERGE versus NO MERGE index update strategies?
Describe how the Hybrid Index Maintenance system works.
Briefly explain the BM25F relevance measure. Briefly explain how pseudo-relevance feedback works.
Give the equations for the LMJM and LMD relevance measures. Do one example calculation with each.
Explain one method to estimate `P_1` and one method to estimate `P_2` in the divergence-from-randomness approach to coming up with a relevance measure.
Give a MapReduce algorithm for finding the most common word in a collection of documents.
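
As a study aid for the last problem, here is one possible sketch in Python, not necessarily the approach presented in class. It assumes two chained jobs: the first counts occurrences of each word, and the second sends every (word, total) pair to a single reducer that keeps the maximum. The function names, the whitespace tokenizer, and the toy single-process driver `run_job` are all hypothetical and stand in for a real MapReduce framework.

```python
from collections import defaultdict


def map_count(doc_id, text):
    """Job 1 map: emit (word, 1) for every token in one document."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer.
    for word in text.lower().split():
        yield word, 1


def reduce_count(word, counts):
    """Job 1 reduce: sum the partial counts for one word."""
    yield word, sum(counts)


def map_to_single_key(word, total):
    """Job 2 map: route every (word, total) pair to the same reducer."""
    yield "max", (total, word)


def reduce_max(key, pairs):
    """Job 2 reduce: keep the pair with the largest total.

    Ties are broken by the lexicographically largest word, because
    tuples are compared element by element.
    """
    total, word = max(pairs)
    yield word, total


def run_job(records, mapper, reducer):
    """Toy driver: map each record, group values by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


def most_common_word(docs):
    """Chain the two jobs; docs maps a document id to its text."""
    counts = run_job(docs.items(), map_count, reduce_count)
    return run_job(counts, map_to_single_key, reduce_max)[0]
```

For example, calling `most_common_word({"d1": "the cat sat", "d2": "the dog and the cat"})` counts "the" three times across both documents and returns that word with its total. In a real cluster a combiner on the first job would likely reduce shuffle traffic, but that detail is left out of this sketch.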